Complex Dtype Support for Hashmap Algos #36482

alimcmaster1 · 2020-09-19T18:58:54Z

fixes #17927 (Functions that rely on hash tables are incorrect for complex numbers) and #26475 (df.groupby on a column with complex numbers is broken)
tests added / passed
passes black pandas
passes git diff upstream/master -u -- "*.py" | flake8 --diff

Ref: #18009

Based on #27599

First set of tests for complex number handling + sensible results from functions that rely on hash tables.

Use generic object hashing for now.

@jbrockmendel you interested in reviewing?

Co-authored-by: Luca Ionescu <[email protected]>


alimcmaster1 · 2020-09-19T19:02:55Z

pandas/tests/test_complex.py

@@ -0,0 +1,129 @@
+import numpy as np


Any better locations for this test file?

can you split the tests to the appropriate files, for example pandas/tests/series/methods/test_value_counts.py

jbrockmendel · 2020-09-19T20:27:07Z

do we have a list of what algorithms this is used for? if they are all unique/factorize-like, we might be able to do a .view(floatlike) to avoid an object cast

WillAyd · 2020-09-21T22:16:43Z

This seems OK to me to start - could probably optimize later with things like .view(floatlike) though there may be some drawbacks to doing that as well
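To make the `.view(floatlike)` idea above concrete, here is a minimal NumPy-only sketch. It is not the approach taken in this PR (which uses generic object hashing) and it is not pandas' hashtable code; the helper name `unique_complex_by_bits` is hypothetical. It reinterprets each complex128 value as a pair of float64 values and deduplicates on the raw bytes. That also exposes one possible drawback of the kind hinted at: values that compare equal but have different bit patterns, such as 0.0 and -0.0, are treated as distinct.

```python
import numpy as np


def unique_complex_by_bits(values):
    """Order-preserving unique of complex values, keyed on raw bit patterns.

    Hypothetical helper for illustration only; not part of pandas.
    """
    arr = np.ascontiguousarray(values, dtype=np.complex128)
    # Reinterpret each complex128 as two float64 values (real, imag).
    # The view itself does not copy the data held by ``arr``.
    pairs = arr.view(np.float64).reshape(-1, 2)
    seen = set()
    keep = []
    for i in range(len(pairs)):
        key = pairs[i].tobytes()  # 16 bytes: real part, then imaginary part
        if key not in seen:
            seen.add(key)
            keep.append(i)
    return arr[np.array(keep, dtype=np.intp)]


result = unique_complex_by_bits([1 + 1j, 0, 1, 1j, 1 + 2j, 1 + 2j])
print(result)

# Drawback: 0.0 and -0.0 compare equal but differ bitwise, so both are kept.
zeros = unique_complex_by_bits([complex(0.0, 0.0), complex(-0.0, 0.0)])
print(len(zeros))
```

A real implementation would have to decide how to normalize such bit patterns (signed zeros, differing NaN payloads) before hashing.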

jbrockmendel · 2020-10-28T22:17:57Z

I'm finding in a mostly-unrelated branch that having actual support for complex (in particular complex128) would be tremendously helpful. Just gentle encouragement.

jreback

pls rebase as well

jreback · 2020-10-31T19:01:34Z

pandas/tests/test_complex.py

@@ -0,0 +1,129 @@
+import numpy as np


can you split the tests to the appropriate files, for example pandas/tests/series/methods/test_value_counts.py

alimcmaster1 · 2020-10-31T23:30:27Z

Sure will take a look @jreback

alimcmaster1 · 2020-10-31T23:34:25Z

> I'm finding in a mostly-unrelated branch that having actual support for complex (in particular complex128) would be tremendously helpful. Just gentle encouragement.

Sure, which branch was this? This doesn't offer complex128 support, but it makes a lot of basic functionality work. I can perhaps look into that in a follow-up, @jbrockmendel.

alimcmaster1 · 2020-11-26T20:05:32Z

fixing this up today.

alimcmaster1 · 2021-08-26T21:56:43Z

> There are no test cases with not-a-number values, like np.nan, 1j*np.nan, np.nan+1j, 1j+np.nan and np.nan+1j*np.nan (best all at once). These are often tricky and are easily overlooked - thus always a good idea to have some unit tests with them.

Good idea, I added some testing around this in test_duplicates.py.

I also added testing for complex64 and complex128 in test_duplicates.py, test_reductions.py and test_value_counts.py.
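Why the NaN cases listed above are tricky can be shown with a small pure-Python sketch. It does not reproduce the tests added in this PR, and the helpers `complex_key` and `duplicated_flags` are hypothetical. The sketch assumes that a NaN in the real (or imaginary) part should match another NaN in the same part; whether pandas' hashtables apply exactly that rule is not established in this thread.

```python
import math


def complex_key(z):
    """Hashable key for a complex value with NaN parts canonicalized.

    Assumption of this sketch: NaN matches NaN within the same component.
    """
    z = complex(z)
    re = "nan" if math.isnan(z.real) else z.real
    im = "nan" if math.isnan(z.imag) else z.imag
    return (re, im)


def duplicated_flags(values):
    """Return True for every value whose key was already seen earlier."""
    seen = set()
    flags = []
    for v in values:
        key = complex_key(v)
        flags.append(key in seen)
        seen.add(key)
    return flags


nan = float("nan")
values = [nan, 1j * nan, nan + 1j, 1j + nan, nan + 1j * nan]

# A plain set cannot deduplicate these: NaN-containing values never compare equal.
a, b = nan + 1j, 1j + nan
print(len({a, b}))

flags = duplicated_flags(values)
print(flags)
```

Here `1j * nan` and `nan + 1j * nan` both evaluate to a value with NaN in both parts, while `nan + 1j` and `1j + nan` share a NaN real part and an imaginary part of 1, which is why exercising all five values in one test is useful.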

I think as a follow up we can address the inferred_type issue in complex indexes:

In [23]: pd.Series([3, 2, 1], index=pd.Index([3j, 1 + 1j, 1])).index.inferred_type
Out[23]: 'mixed-integer'
(Would expect this to be "complex")

In [24]: pd.Series([3, 2, 1], index=pd.Index([3j, 1 + 1j, 1], dtype=np.complex128)).index.inferred_type
Out[24]: 'complex'

jreback · 2021-08-26T23:59:55Z

cc @realead if you can look

realead

@alimcmaster1 Please delete outdated comment, otherwise lgtm.

realead · 2021-08-29T08:46:28Z

pandas/tests/series/methods/test_value_counts.py

+        ],
+    )
+    def test_value_counts_complex_numbers(self, input_array, expected):
+        # Complex Index dtype is cast to object


Is this comment still valid? IIUC value_counts uses complex128/256 and not objects, see

pandas/pandas/_libs/hashtable_func_helper.pxi.in

Line 331 in e39ea30

cpdef value_count(ndarray[htfunc_t] values, bint dropna):

The dtype of the index will be object, see below. Agreed, this probably needs fixing; I can create a follow-up, as this is the same issue your comment below refers to. #36482 (comment)

In [14]: pd.Series([1 + 1j, 1 + 1j, 1, 3j, 3j, 3j]).value_counts().index
Out[14]: Index([3j, (1+1j), (1+0j)], dtype='object')

pls create a followon issue
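The behavior discussed in this thread can be checked with a short script. It is only a sketch using the same input as the session above; the counts are compared through a plain dict so that the check does not depend on the index dtype, which is printed rather than assumed because it may differ between pandas versions.

```python
import pandas as pd

ser = pd.Series([1 + 1j, 1 + 1j, 1, 3j, 3j, 3j])
counts = ser.value_counts()

# Map each distinct value to its count, independent of the index dtype.
as_dict = dict(zip(counts.index.tolist(), counts.tolist()))
print(as_dict)

# The index dtype is the open question in this thread; inspect it directly.
print(counts.index.dtype)
```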

realead · 2021-08-29T08:52:50Z

pandas/tests/test_algos.py

+        [
+            (
+                [1 + 1j, 0, 1, 1j, 1 + 2j, 1 + 2j],
+                np.array([(1 + 1j), 0j, (1 + 0j), 1j, (1 + 2j)], dtype=object),


Hmm, dtype=object here... I wonder whether we should have a unique function with fused types, as we already do for other functions, e.g. value_counts (

pandas/pandas/_libs/hashtable_func_helper.pxi.in

Line 331 in e39ea30

cpdef value_count(ndarray[htfunc_t] values, bint dropna):

)
What do you think @jbrockmendel? Probably should not be part of this PR though. It looks like dtype=object for factorize and Index are just consequences of unique returning objects.

@jbrockmendel - whenever you have time, I think your eyes on this would be much appreciated.

yah i think ideally this should return a complex dtype (same for factorize above). OK for that to be a separate PR, can leave a comment on the test to that effect

Great, I have added comments to that effect and will create a follow-up issue.
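Until `unique` returns a complex dtype directly, a caller can convert the result explicitly. The following is a minimal sketch of that workaround, not something this PR adds; it uses the input from the test above and works whether `pd.unique` hands back an object array or a complex one.

```python
import numpy as np
import pandas as pd

values = np.array([1 + 1j, 0, 1, 1j, 1 + 2j, 1 + 2j])

# pd.unique keeps the order of first appearance.
uniques = pd.unique(values)
print(uniques.dtype)  # printed, not assumed: may depend on the pandas version

# Convert explicitly so downstream code always sees complex128.
as_complex = np.asarray(uniques, dtype=np.complex128)
print(as_complex)
```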

pandas/tests/groupby/test_groupby.py

pandas/tests/indexes/multi/test_duplicates.py

pandas/tests/reductions/test_reductions.py

alimcmaster1 · 2021-09-02T23:20:23Z

Restarting CI; the test failures are unrelated.

Could not find conda environment: pandas-dev
You can list all discoverable environments with `conda info --envs`.

Error: Process completed with exit code 1.

All comments addressed; this should be good.

alimcmaster1 · 2021-09-03T13:50:10Z

/azp run

azure-pipelines · 2021-09-03T13:50:21Z

Azure Pipelines successfully started running 1 pipeline(s).

jbrockmendel

LGTM

alimcmaster1 · 2021-09-03T20:11:11Z

> LGTM

Appreciated the review :)

jreback

lgtm, thanks a lot @alimcmaster1

if you can create the followon (checkboxes if needed) would be great

alimcmaster1 and others added 14 commits January 3, 2020 01:38

Merge master

1c6b786

Co-authored-by: Luca Ionescu <[email protected]>

Fix test failures ignore FutureWarning

42a46d7

Filter warning correctly

8331d06

Fix imports

3ba4169

Merge remote-tracking branch 'remotes/upstream/master' into lucaionescu-mcmali

8302589

Add warning annotation

5068771

Remove unrequired annotation

8d65aa7

Merge remote-tracking branch 'remotes/upstream/master' into lucaionescu-mcmali

8b7ac7d

Merge remote-tracking branch 'upstream/master' into lucaionescu-mcmali

45c8237

Update docs

cb74fe3

Create deepsource.toml

b29404e

Commit Complex handling

f983f4f

run black

c2e4e82

Use pandas.testing

7c42495

alimcmaster1 added the Complex Complex Numbers label Sep 19, 2020

Use pandas.testing

41b1faf

alimcmaster1 commented Sep 19, 2020


pandas-dev deleted a comment from pep8speaks Sep 19, 2020

alimcmaster1 changed the title ~~ENH: Complex Dtype Support for Hashmap Algos~~ WIP: Complex Dtype Support for Hashmap Algos Sep 19, 2020

Clean ups

da53f38

github-actions bot added the Stale label Oct 22, 2020

jreback requested changes Oct 31, 2020


Merge remote-tracking branch 'upstream/master' into mcmali-complex

32262e7

Move test to sep files

f4932d9

Pep8

d1e00b7

pandas-dev deleted a comment from pep8speaks Aug 26, 2021

alimcmaster1 added 2 commits August 26, 2021 22:27

isort

df28514

More tests

6b4c10e

alimcmaster1 requested a review from jbrockmendel August 26, 2021 21:57

alimcmaster1 added this to the 1.4 milestone Aug 26, 2021

Add type info

9afed5f

realead suggested changes Aug 29, 2021


alimcmaster1 added 2 commits August 31, 2021 16:51

Merge Master

96d5a58

Fix whatsnew

e9a4ca2

alimcmaster1 requested a review from realead August 31, 2021 22:09

jbrockmendel reviewed Aug 31, 2021

pandas/tests/groupby/test_groupby.py (outdated, resolved)

jbrockmendel reviewed Aug 31, 2021

pandas/tests/indexes/multi/test_duplicates.py (resolved)

jbrockmendel reviewed Aug 31, 2021

pandas/tests/reductions/test_reductions.py (outdated, resolved)

alimcmaster1 added 3 commits September 1, 2021 20:55

Merge remote-tracking branch 'upstream/master' into mcmali-complex

e882b4e

Updates as per comments

6bf72a0

Fix tests

e53417d

alimcmaster1 requested a review from jbrockmendel September 1, 2021 21:42

Merge Master

fdf45b1

jbrockmendel approved these changes Sep 3, 2021


alimcmaster1 requested a review from jreback September 3, 2021 20:39

jreback approved these changes Sep 4, 2021


jreback merged commit d08a792 into pandas-dev:master Sep 4, 2021

feefladder pushed a commit to feefladder/pandas that referenced this pull request Sep 7, 2021

Complex Dtype Support for Hashmap Algos (pandas-dev#36482)

b78791c
